Abstract —Based on the Aristotelian concept of potentiality
vs. actuality allowing for the study of energy and dynamics in
language, we propose a field approach to lexical analysis. Falling
back on the distributional hypothesis to statistically model word
meaning, we used evolving fields as a metaphor to express
time-dependent changes in a vector space model by a combination
of random indexing and evolving self-organizing maps (ESOM).
To monitor semantic drifts within the observation period, an
experiment was carried out on the term space of a collection of
12.8 million Amazon book reviews. For evaluation, the semantic
consistency of ESOM term clusters was compared with their
respective neighbourhoods in WordNet, and contrasted with
distances among term vectors by random indexing. We found that at
the 0.05 level of significance, the terms in the clusters showed a high
level of semantic consistency. Tracking the drift of distributional
patterns in the term space across time periods, we found that
consistency decreased, but not at a statistically significant level.
Our method is highly scalable, with interpretations in philosophy.

I. Introduction 

The modeling of semantic content for statistical analysis,
prominently by means of computational and theoretical linguistics,
has been quietly inspired by physics and chemistry
over the past two decades. Strictly on a metaphoric basis,
the idea was to compare language as a rule-based system
to domains of natural science as similar systems for innovative
model design. Such endeavours typically go back to two kinds
of physical phenomena, i.e. attraction acting on its own as
in gravity, a force with non-polar roots, vs. a system of attraction
and repulsion based on polarity, as in electromagnetism.
Both word meaning and sentence meaning show statistical 
behaviour compliant with the idea of non-polar [15], [14] vs. 
polar [5], [80], [57] binding forces, allowing for latent analytic 
thinking for category building in many areas, including natural 
language processing [65], bioinformatics networks [45], [79], 
quantum theory [6], or chemometry [10]. Clearly, similarity 
of meaning as an attractor vs. difference of meaning as a 
repellent are organizing principles of conceptual processing 
one cannot ignore, and an interesting way ahead is to explore 
the implications of this observation. 

Below we will consider word semantics as the “behaviour” 
of linguistic signs of a dual nature, i.e. intertwined form and 
content, leading to the emergence of conceptual categories 
over objects, and ultimately to the applicability of artificial 
neural networks [38] for machine learning. Mathematical
“energy” [8], [72] and machine learning are related, the latter
often being based on minimizing a constrained multivariate 
function such as a loss function. Concepts in feature space 
“sit” at energy minima, representing the cost of a classification 
decision as an energy minimizing process. This suggests that 
machine learning must identify concepts with such minima, 
and since potential energy in physics is carried by a field 
or a respective topological mapping, concepts naturally have 
something to do with energy as work capacity. As this general 
process is practically isomorphic with the theory of reaction 
paths over a potential hypersurface leading to the probabilistic
composition of chemical compounds in computational
chemistry [50], [43], we believe that evolving fields as a 
metaphor to simulate category formation by semantic content 
is a legitimate approach. Furthermore, attractor networks [1], 
[14] establishing a quasi-continuous field [75] and capable of 
processing both word and sentence meaning link the above 
considerations with the study of neural networks. 

Key to our current line of thought is the semantic continuity
hypothesis, i.e. the assumption that any vocabulary modelled
by term space consists of both actual and potential word
content, the former mapped to observable locations, the latter
filling in the so-called “lexical gaps” between them. Linguistics
offers innumerable examples for the existence of such gaps
where a language lacks spelled out, i.e. actualized content in
contrast to another one. This continuity is best modelled as
an evolving field, with both actual and potential word content 
constantly dislocated over time. Due to such dislocations, both 
the actual positions and their embedding potential contexts 
may change, offering a rich texture of semantic substance 
quasi charging actual term locations vs. discharged potential 
ones. The same line of thought applies to the vector space 
model of sentence content [13]. Given timestamped data, one
can measure such dislocations, called semantic drift, an
important indicator of ongoing language change [3], [19],
prominently affecting the monitoring of novelties in document 
indexing terminology. 

We pursue the following two objectives:

1) We are interested in evaluating semantic consistency 
within single time periods of an evolving data set. 

2) We would like to see if semantic drift can be detected 
by analysing the change in semantic consistency. 



II. Background 

In what follows we introduce four considerations leading to 
our methodology underlying the experiment design. 

A. Semantic similarity 

As object or feature categorization by neural networks 
depends on the concept of similarity as a fundamental “binding 
force”, we briefly review measures of semantic relatedness 
(MSR) to express thematic coherence [68]. In linguistics 
relevant for text processing, there are two prominent theories 
of word meaning, the distributional hypothesis [33], and the 
referential theory of word semantics [25]. According to the 
first, meaning depends on word use, i.e. is contextual, whereas 
for the second one, it is referential, i.e. goes back to convention 
expressed e.g. by definitions in ontology entries. Because 
habitual word use as context clearly implies agreements about 
the sense in which certain word forms are being used in certain 
contexts, there is a dependency between the two approaches. 

Automated systems assign a score of semantic relatedness 
to a given pair of terms calculated from a relatedness measure. 
The absolute score itself is typically irrelevant on its own; what 
is important is that the measure assigns a higher score to term 
pairs which humans think are more related and comparatively 
lower scores to term pairs that are less related [52].

Distributional similarity and its predecessors go back a long 
way, building on the notion of term dependency and structures
derived therefrom [53]. The underlying distributional
hypothesis is often cited for explaining how word meaning 
enters information processing [37]. Before attempts to utilize 
lexical resources for the same purpose, this used to be the sole 
source of word semantics in information retrieval, inherent in 
the exploitation of term occurrences - most notably, in the 
term frequency-inverse document frequency (TFIDF) measure 
- and co-occurrences [26], [56], [63], including multiple-level 
term co-occurrences [39]. On the other hand, the referential 
approach relies these days on lexical resources. A lexical resource
in computer science is a structure that captures semantic
relations among terms, quasi “charging” word occurrences in 
context with external information. 

The reason for combining the two approaches is that statistical
techniques typically suffer from the sparse data problem:
they perform poorly when the terms are relatively rare. Hybrid 
methods attempt to address this problem by supplementing 
sparse data with information from a lexical database [59], [35]. 
In a semantic network, to differentiate between the weights of 
edges connecting a node and all its child nodes, one needs 
to consider the link strength of each specific child link. This 
is a situation in which corpus statistics can contribute. The 
following types of resources are commonly used in measuring 
semantic similarity between terms: dictionaries [42], semantic 
networks, such as WordNet [51], thesauri modelled on Roget’s 
Thesaurus [54], and ontologies. 

B. Semantic fields 

We find the tradition of using a combination of two planes
to describe a phenomenon in several disciplines. One example is the
general practice of evaluating the effectiveness of information
retrieval and text categorization models by measures like
recall, precision, accuracy, and many more [68]. There is
ongoing work to build semantic spaces from distributional vs. 
compositional semantics [55], [23], representing both word 
and sentence meaning as locations in high-dimensional space 
where for phrase or sentence component binding, recursive 
matrix-vector spaces [64], the tensor product [4], [6], [30], 
or circular holographic reduced representation are routinely 
used [13]. In these models, the representation of semantic 
content in documents is compared to an ideal state of language 
use, provided by the human standards of interpretation inherent 
in the evaluation method [22]. Using geometry or probability 
as a vehicle of meaning, i.e. building a new medium of 
language, aims at maximizing similarity between the human 
standard and its statistical reconstruction. This hypothetic 
original, a correlate of spoken language called a mental state 
or internal state in neuroscience [21], recalls the “language of 
thought hypothesis” in philosophy [24], also called mentalese. 
A joint element in the above is that whereas language as a 
mental phenomenon is assumed to be continuous, its uttered 
or mathematically modelled representations are discrete. 

The same duality returns as “hidden metaphysics” in
traditional mentalist and more recent generalist-universalist 
theories about language: language is but a tool operated by 
something deeper - thought, reason, logic, cognition - which 
functions in line with biological-neurological mechanisms 
common to all human beings [34]. Moreover, a linguistic 
school of thought orthogonal to the above theories, called
Neo-Humboldtian field theories of word meaning, goes back to
the same dual model where discrete distributions of related 
content called lexical or semantic fields, based on language 
use, are underpinned by the assumption of conceptual fields 
in the mind. Then, the lexical field of related words is only 
an outward manifestation of the underlying conceptual field 
so that the sum total of conceptual fields describes one’s 
world view [67]. In yet another unrelated school of thought, 
Saussure’s structural linguistics, language (langue) is a mental
grammar with a rule set specifying ideal content pronunciation,
whereas speech (parole) stands for the exemplification of those
rules [16]. 

An important symptom of lexical fields is that regions 
of related content are separated by lexical gaps. These are 
nonexistent names for things where one could exist by rules 
of a particular language, and indicate possible conceptual 
distinctions not mapped to actual language use, such as 
mother’s father (Swedish morfar) vs. father’s father (Swedish
farfar), both called grandfather in English, or father’s brother
(Swedish farbror) not distinguished from mother’s brother
(Swedish morbror), both called uncle. Such language-specific
discontinuities of semantic content play a prominent role in 
our methodology. 

The assumption that products of the mind are continuous 
while their mapping to spoken language is discrete goes back 
ultimately to Aristotle’s Metaphysics. In this, existence or
reality is described as the sum total of two components:
conceivable potentiality (dynamis) plus observable-measurable
actuality (energeia). These are names for the latent vs. manifest
capacity of existents to induce change. Therefore in our
current thinking, existence consists of two layers, potentiality
(a continuum) and actuality (a discrete distribution sampling
the former). Importantly, one ascribes a field nature to mental
experience because of the potentiality layer which we indirectly
perceive by the actualized values of events.

C. Measuring semantic consistency 

For any model departing from the idea of similarity between 
instances in a semantic field, a logical next question is, how 
coherent are the groups of instances in that field? Relating term 
similarity and semantic consistency, the domain restriction 
hypothesis answers that question [28]. Based on the filtering
away of extracted but false sense relations, semantically related 
terms extracted from a corpus tend to be semantically coherent. 
To this end, semantic domains are used as filters by integrating 
pattern-based and distributional approaches to capture two 
characteristic properties of semantic relations: 

• Syntagmatic properties: if two terms X and Y are in a 
given relation, they tend to co-occur in texts, and are 
mostly connected by specific lexical-syntactic patterns 
(e.g., the pattern “X is a Y” connects terms in is-a 
relations). This aspect is captured using a pattern-based 
approach; 

• Domain properties: if a semantic relation among two 
terms X and Y holds, both X and Y should belong to 
the same semantic domain (i.e. they are semantically 
coherent), where semantic domains are sets of terms 
characterized by very similar distributional properties in 
a (possibly domain specific) corpus. 

This approach is detailed in [27]. On the other hand, in 
a recent reincarnation, semantic consistency is a new distant 
supervision method which can identify reliable instances from 
noisy instances by inspecting whether an instance is located 
in a semantically consistent region. One way to find out is 
to first model the local subspace around an instance as a 
sparse linear combination of training instances, then estimate 
the semantic consistency by exploiting the characteristics of 
the local subspace [32].

D. Semantic drifts 

In the context of Semantic Web dynamics [2], there is
a growing body of literature about the semantic drift [40],
[31], the language-related version of abrupt parameter value
changes in data mining called concept drifts [18], [73], [61],
[29]. By semantic drift we mean how the features of ontology
concepts gradually change as their knowledge domain evolves,
or, alternatively, how different user communities reinterpret
the same concept in a different context, with the risk of
these concepts losing their rhetorical, descriptive and
applicative power [11]. In a more general sense, the topic
is important beyond its linguistic implications, especially for
managing semantic interoperability for federations; respective
research to date has focused on the generation of semantic
mappings and has tended to ignore the problem of dealing
with the dynamism of both the data and the schemata that is
characteristic of real-world integration problems [7].

III. Methodology 

We build a vector space model by random indexing that 
is able to closely track the changes of an evolving text 
collection. We project the space to a two-dimensional surface
where clusters and shifts are more apparent by emergent
self-organizing maps; this projection preserves the local topology
of the high-dimensional space and allows us to model dynamic 
semantic fields. We use WordNet-based referential similarity 
measures to evaluate semantic consistency and also to detect 
semantic drifts over time. Refer to Figure 1 for an outline. 

A. Distributional similarity and random indexing 

We build a TFIDF vector space model of the corpus, 
which provides the foundation for most distributional semantic 
distance measures. The basic TFIDF space is known to be 
extremely sparse, having 1-5 % nonzero elements. Latent 
semantic analysis, or latent semantic indexing [17], also measures
semantic information through co-occurrence analysis in the
corpus, and in addition it reduces the dimensionality, which
mitigates the problem of sparsity. The dimension of the vector
space is reduced by singular value decomposition.
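
As a small illustration of the weighting behind the TFIDF space, the following sketch computes term weights for a toy corpus. It assumes one common variant (raw term frequency multiplied by log(N/df)); the exact weighting used in the experiments is not specified here, and the three toy documents are invented for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (raw tf times log(N/df))."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

# Toy corpus (hypothetical): each document is a list of tokens.
docs = [["great", "plot", "great", "characters"],
        ["dull", "plot"],
        ["great", "prose"]]
vectors = tfidf(docs)
```

Each document stores only the terms it contains, which mirrors the sparsity of the full term-document matrix.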

Random indexing is a similar idea which does not rely 
on the use of computationally intensive matrix decomposition.
This makes random indexing a much more scalable
technique in practice. Instead of first constructing a huge 
co-occurrence matrix and then using a separate dimension 
reduction phase, random projection builds an incremental word 
space model [36]. The random projection technique can be 
described as a three-step operation: 

• First, each document in the corpus is assigned a unique 
and randomly generated representation called an index 
vector. These index vectors are sparse, high-dimensional, 
and ternary, which means that their dimensionality (d) is
on the order of hundreds, and that they consist of a small
number of randomly distributed +1 and -1 values, with the rest of
the elements of the vectors set to 0.

• Then, context vectors are produced by scanning through 
the text, and each time a word occurs in a context (e.g. 
in a document, or within a sliding context window), 
that context’s d-dimensional index vector is added to
the context vector for the word in question. Words are
thus represented by d-dimensional context vectors that
are effectively the sum of the words’ contexts. 

Comparing the term vectors in the random indexed space by a 
similarity measure such as the Euclidean distance or cosine 
dissimilarity enables a quantitative framework for semantic 
analysis. 
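
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Semantic Vectors implementation: whole documents serve as contexts, raw occurrences are accumulated instead of TFIDF weights, and the function names, the dimension of 300 and the 10 nonzero entries are choices made only for this example.

```python
import math
import random

def index_vector(dim, nonzeros, rng):
    """Sparse ternary vector: `nonzeros` entries are +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(docs, dim=300, nonzeros=10, seed=0):
    """Return {term: context vector}, using whole documents as contexts."""
    rng = random.Random(seed)
    context = {}
    for doc in docs:
        idx = index_vector(dim, nonzeros, rng)  # one index vector per document
        for term in doc:                        # add it once per term occurrence
            vec = context.setdefault(term, [0] * dim)
            for i, value in enumerate(idx):
                vec[i] += value
    return context

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpus (hypothetical): two terms share all their contexts, one shares none.
docs = [["thymine", "cytosine"],
        ["thymine", "cytosine", "uracil"],
        ["harvard", "monsignor"]]
context = random_indexing(docs)
```

Terms that occur in exactly the same documents end up with identical context vectors, while terms with disjoint contexts are nearly orthogonal because the sparse index vectors rarely overlap.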

B. Semantic fields in emergent self-organizing maps 

The vectors of the term space are disjoint locations in a 
high-dimensional space. A field, on the other hand, is defined 
at all points in a space. To bridge this gap, we embed
the vector space on a two-dimensional surface using emergent
self-organizing maps [76].

Fig. 1. Outline of methodology.

A self-organizing map is a two-dimensional grid of artificial 
neurons. Each neuron is associated with a weight vector 
that matches the dimension of the training data. We take an 
instance of the training data, find the closest weight vector, 
and pull it closer to the data instance. We also pull the weight 
vectors of nearby neurons closer to the data instance, with 
decreasing weight as we get further from the best matching 
unit. We repeat this procedure with every training instance. 
This constitutes one epoch. We repeat the same process in 
the second epoch, but with a smaller neighbourhood radius, 
and a lower learning rate when adjusting the weight vectors. 
Eventually, the neighbourhood function decreases to an extent 
that training might stop. The time needed to train an SOM 
grows linearly with the data set size, and it grows linearly 
with the number of neurons in the SOM. The resulting network 
reflects the local topology of the high-dimensional space [38]. 
Emergent self-organizing maps contain a much larger number 
of target nodes for embedding, and thus capture the topology 
of the original space more accurately [70]. Using a toroid map 
avoids edge effects. 
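
The training loop described above can be sketched as follows. This is a didactic version under several assumptions that are not taken from the experiments: a Gaussian neighbourhood function, linearly decaying radius and learning rate, and a tiny 6 x 6 toroid grid. Somoclu, the library used in the experiments, is far more efficient and its internals are not reproduced here.

```python
import math
import random

def toroid_distance(a, b, rows, cols):
    """Grid distance between nodes a and b when the map wraps at the edges."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows=6, cols=6, epochs=10, radius0=3.0, rate0=0.5, seed=0):
    """Train a toroid SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Radius and learning rate shrink from one epoch to the next.
        frac = 1.0 - epoch / epochs
        radius = max(radius0 * frac, 0.5)
        rate = rate0 * frac
        for x in data:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d = toroid_distance(node, bmu, rows, cols)
                h = math.exp(-(d * d) / (2 * radius * radius))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += rate * h * (x[i] - w[i])  # pull towards the instance
    return weights
```

The two nested loops over training instances and neurons make the linear cost in both quantities visible, and the wrap-around in `toroid_distance` is what removes the edge effects of a planar map.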

Some nodes of the emergent self-organizing map correspond 
to one or more terms; these nodes or best matching units have 
a special role as they identify semantic content with one or 
more terms. The rest of the nodes act as an interpolation of 
the semantic field. Since the field is continuous in nature, we
use toroid maps - a planar map would introduce an artificial
discrete cut-off at the edges.

C. WordNet-based similarity metrics and validating consis¬ 
tency 

WordNet (WN) is a large lexical database of English, 
created and maintained by Princeton University [51]. It is 
publicly available to research and commercial users free of 
charge and its latest version is 3.1 (released Nov’12).
WordNet’s popularity arguably lies in the fact that, besides merely
offering short definitions and usage examples of the contained
nouns, verbs, adverbs and adjectives, it also introduces certain
types of semantic relations between terms. Examples of such
relations include synonymy, hyper/hyponymy (is-a relationship),
meronymy (part-of relationship) etc. WN can, thus, be
viewed as a combination of a dictionary and a thesaurus. 

Due to its structure described above, WN has been extensively
deployed in tasks related to determining the semantic
similarity between terms, primarily in automatic text analysis
and artificial intelligence applications. Towards this direction,
many semantic similarity metrics have been proposed, which 
can be grouped into four different categories [49]: path-based, 
content-based, feature-based and hybrid metrics. 

• In path-based metrics the similarity between two terms 
depends on their relative position in the WN taxonomy 
as well as on the length of the path linking the concepts. 
Representative examples deploying path-based measures 
include [9], [41], [44] and [78].

• Content-based metrics are based on the information 
content available for each concept in WN. The more 
common information two concepts share, the more similar
they are. Examples belonging to this category include
[35], [46], [47] and [59].

• Feature-based metrics are based on the properties of the 
WN ontology for obtaining a similarity value. The more 
common (and the fewer non-common) characteristics two 
concepts have (e.g. their definitions or “glosses” in WN), 
the more similar they are. Relationships to other similar 
terms in the taxonomy are also taken into consideration. 
Some related approaches are the classical model proposed 
by Tversky [69] or the more recent approach presented 
in [62]. 

• Finally, the hybrid metrics combine the ideas presented 
above. The following constitute paradigms of applying 
hybrid similarity metrics: [20], [60] and [81]. 

Apart from the above distinction, Varelas et al. also divide
semantic similarity measures into single ontology and
cross ontology methods [71]; the former assume that the terms
compared all come from the same reference ontology, while 
the latter compare terms from two different ontologies. Since 
it is not easy to directly compare the structure and information 
content of different ontologies, the case of cross ontology 
similarity typically employs hybrid or feature-based methods 
(e.g. see [60] and [44]). 
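
To make the path-based family concrete, the sketch below computes the Wu and Palmer measure [78], 2 * depth(lcs) / (depth(a) + depth(b)), on a hand-made taxonomy. The taxonomy is a hypothetical tree invented for this example, not an excerpt of WordNet, and the root is assumed to have depth 1; WordNet itself allows multiple hypernyms per concept, which this simplified version does not handle.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root (node first, root last)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def depth(node, parent):
    """Number of nodes on the path from the root to `node` (root has depth 1)."""
    return len(path_to_root(node, parent))

def wu_palmer(a, b, parent):
    """Wu-Palmer similarity of two concepts in a tree-shaped taxonomy."""
    ancestors_a = set(path_to_root(a, parent))
    # Walking upwards from b, the first shared node is the deepest common subsumer.
    lcs = next(n for n in path_to_root(b, parent) if n in ancestors_a)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))

# Hypothetical toy taxonomy given as child -> parent links.
parent = {
    "compound": "entity", "institution": "entity",
    "nucleobase": "compound", "university": "institution",
    "thymine": "nucleobase", "cytosine": "nucleobase", "uracil": "nucleobase",
}
print(wu_palmer("thymine", "cytosine", parent))    # 0.75
print(wu_palmer("thymine", "university", parent))  # 2/7, about 0.286
```

Concepts that share a deep common ancestor score close to 1, whereas concepts that meet only at the root score low regardless of how specific they are.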














D. Quantifying drifts 

The foundation for measuring the drift is based on random 
indexing of subsequent TFIDF spaces with a fixed random 
seed. By following this method, we are able to derive subsequent
low-dimensional spaces which can be compared against
one another. We train an emergent self-organizing map on 
each of these term vector spaces: after the first period, we 
continue training the map with a lower learning rate to arrive 
at smoothly changing dynamics. 
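
The role of the fixed seed can be illustrated with a small sketch. It assumes, purely for illustration, that each index vector is derived deterministically from the seed and a document identifier, so that a given document receives the same random projection no matter which period's run generates it; how the Semantic Vectors package handles seeding internally is not described here, and the identifiers below are hypothetical.

```python
import random

def index_vector(doc_id, seed, dim=300, nonzeros=10):
    """Ternary index vector that depends only on (seed, doc_id)."""
    # String seeds are hashed deterministically by the random module.
    rng = random.Random(f"{seed}:{doc_id}")
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

# Same seed: the document keeps its index vector across runs and periods.
same = index_vector("review-001", seed=42) == index_vector("review-001", seed=42)
# Different seed: the projection changes and the spaces are no longer comparable.
different = index_vector("review-001", seed=42) != index_vector("review-001", seed=43)
```

Without a shared seed, two periods would be projected onto unrelated random bases, and any measured displacement of a term would mix real drift with projection noise.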

As a potential source of confusion, we point out that the 
time-like variable of the iterations and the epochs in the 
training of the ESOM are unrelated to the temporal aspects 
of the corpus. By time we always refer to the time related to 
the corpus, and by a period we refer to documents belonging 
to a certain time interval of the corpus. Epoch, on the other 
hand, refers to the training rounds of the ESOM. 

IV. Experiment design 

The experiments were based on a large text corpus on 
which the TFIDF spaces and random indices were created. 
The random indices were used to generate the sequence of 
self-organizing maps. We analysed the topology of these maps 
for consistency and drifts. 

A. Corpus 

Book reviews are the literary genre equivalent of abstracts
to scientific articles, cross-pollinated by the idea of crowdsourcing
underlying recommender systems [58], i.e. one summary
per article produced by one professional abstracting and
indexing service vs. potentially many summaries of the same
item, written by users as part professional, part lay contributors.
In a sense both approaches represent user feedback. From
a methodological perspective, due to the nature of condensed
semantic content in them, book review processing falls in
the category of e.g. text summarization [12] on the one hand,
combined with sentiment analysis [66] on the other hand.
Due to this blend, they represent an interesting and scalable
resource of complex semantic content for “neuromorphic”
studies.

The experiments described here were based on Stanford’s 
Amazon book reviews data set [48], which is publicly available 
as part of the University’s SNAP project 1 . The data set spanned 
a period of 18 years and included approximately 12.8M book 
reviews up to March 2013. Every item in the data set included 
product and user information, ratings, as well as a plain text 
content description. The collection contained 51 degenerate
time stamps; the corresponding instances were discarded.

Period 1            Period 2            Period 3
Until 30 Jan 2003   Until 03 Aug 2008   Until 04 Mar 2013
45162 terms         49400 terms         50672 terms

TABLE I
Key statistics of the temporal split of the corpus.

1 http://snap.stanford.edu/index.html, last accessed: Jan'15.


We split the corpus into three periods, each containing close
to 4.3M reviews. The key characteristics are summarised in 
Table I. 

B. Computational background 

We used Lucene 2 , which is an information retrieval software
library that builds an inverted index, which can be interpreted
as a row-major sparse representation of a term-document matrix.
We used the Semantic Vectors package [74] for reducing
the dimensionality of the space. For training the emergent
self-organizing maps, we used Somoclu [77].

The implementation of the semantic similarity metrics was 
based on WS4J 3 , a Java API for several of the semantic 
relatedness/similarity algorithms presented in section III-C. 
WS4J works over WordNet 3.0 and constitutes an improved 
version of the Perl-based WordNet-Similarity-2.05 4 . In the
experiments described subsequently, we deployed a representative
path-based semantic similarity method (Wu and
Palmer’s [78]), while in the near future we plan to
investigate the behaviour of more methods (path-based and
content-based). Note that all experiments are open source 5 .

C. Consistency in a single time period 

This subsection is aimed at assessing whether proximity
of terms in the toroid plane indicates a strong underlying
semantic similarity. Our focus lay on neurons that had more
than 1 term assigned to them and we were interested in
the average similarities within these neurons. Towards this
direction and in order to evaluate the consistency of our
approach in the case of a single time period, initially we
randomly divided the terms of the period into clusters of 5
terms each (5 is the average count of terms assigned to
non-empty neurons in the toroid plane). For each of the groups, we
computed the average semantic similarity; then, based on all
the derived average similarities, we determined their empirical
probability distribution. The latter was a good approximation
of a normal distribution. This enabled us to make use of
the t-test for evaluating the significance of the similarity, as
described subsequently.
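
The construction of the random baseline can be sketched as follows. The similarity function is passed in as a parameter; `toy_similarity` below is a hypothetical stand-in invented for this example (in the experiments the Wu and Palmer measure over WordNet plays this role), and the vocabulary of 50 artificial terms is likewise made up.

```python
import random
import statistics
from itertools import combinations

def average_similarity(group, sim):
    """Mean pairwise similarity of the terms in one group."""
    pairs = list(combinations(group, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def baseline_distribution(terms, sim, group_size=5, seed=0):
    """Shuffle terms, cut them into groups, return (mean, variance) of group averages."""
    rng = random.Random(seed)
    shuffled = list(terms)
    rng.shuffle(shuffled)
    # Only complete groups are kept; a short remainder is dropped.
    groups = [shuffled[i:i + group_size]
              for i in range(0, len(shuffled) - group_size + 1, group_size)]
    averages = [average_similarity(g, sim) for g in groups]
    return statistics.mean(averages), statistics.variance(averages)

def toy_similarity(a, b):
    """Hypothetical similarity: terms sharing their first letter count as related."""
    return 1.0 if a[0] == b[0] else 0.2

terms = [f"{letter}{i}" for letter in "abcde" for i in range(10)]
mu0, var0 = baseline_distribution(terms, toy_similarity)
```

The returned mean and variance are the empirical parameters against which the average similarity within each neuron is later compared.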

The key idea was that we assumed that the terms within each
neuron constituted a random group from the total population.
Hence, the average similarities within the neurons followed
the normal distribution with mean equal to the empirical
mean ($\mu_0$) from the above distribution and variance equal
to its empirical variance ($\sigma_0^2$). Based on this assumption
($H_0 : \mu = \mu_0$), we performed a 1-sided t-test for a predefined
level of significance ($\alpha$), in order to assess whether the average
within each neuron was statistically significantly greater than
this empirical mean ($H_1 : \mu > \mu_0$). We considered three
generic cases: (a) neurons containing 3 or more terms, (b)
neurons containing 5 or more terms, and, finally, (c) neurons
containing 10 or more terms. For each of the three cases,

2 http://lucene.apache.org/, last accessed: Jan'15.

3 https://code.google.com/p/ws4j/, last accessed: Jan'15.

4 http://wn-similarity.sourceforge.net/, last accessed: Feb'15. 

5 https://github.com/peterwittek/concept_drifts 




              Period 1                    Period 2                    Period 3
Term count    α = 0.05  α = 0.1  α = 0.2  α = 0.05  α = 0.1  α = 0.2  α = 0.05  α = 0.1  α = 0.2
N ≥ 3         0.287     0.334    0.404    0.269     0.320    0.406    0.259     0.316    0.400
N ≥ 5         0.339     0.380    0.429    0.327     0.366    0.421    0.304     0.353    0.410
N ≥ 10        0.383     0.416    0.453    0.372     0.401    0.448    0.332     0.380    0.431

TABLE II

Percentages of rejection of the null hypothesis for the t-test for each period and for different neuron populations.


Periods         α = 0.05  α = 0.1  α = 0.2
3 - p1 - p2     0.051     0.130    0.860
3 - p2 - p3     0.211     0.614    0.505
5 - p1 - p2     0.330     0.251    0.515
5 - p2 - p3     0.051     0.301    0.351
10 - p1 - p2    0.318     0.494    0.833
10 - p2 - p3    0.060     0.344    0.451

TABLE III

P-values of the comparison of percentages displayed in Table II between consecutive periods and for different neuron populations.


we repeatedly performed the 1-sided t-test for every neuron
and calculated the percentage ($p$) of the cases where we
could reject the null hypothesis. A percentage greater than
$\alpha$ indicated that the number of samples with high average
similarity was much greater than expected, based on the
assumption that the average similarities followed the normal
distribution $N(\mu_0, \sigma_0^2)$.
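
A simplified reading of this per-neuron test can be sketched with the standard library. Note the substitution: instead of the t-test, the sketch evaluates each neuron average directly against the reference normal distribution $N(\mu_0, \sigma_0^2)$, i.e. a one-sided z-style test, because the exact form of the t statistic is not spelled out here. All numbers in the example are made up.

```python
from statistics import NormalDist

def rejection_rate(neuron_averages, mu0, sigma0, alpha=0.05):
    """Fraction of neurons whose average similarity is significantly above mu0."""
    null = NormalDist(mu0, sigma0)
    # One-sided p-value: probability of an average at least this large under H0.
    p_values = [1.0 - null.cdf(avg) for avg in neuron_averages]
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical baseline parameters and neuron averages.
averages = [0.31, 0.29, 0.50, 0.62]
rate = rejection_rate(averages, mu0=0.3, sigma0=0.1)
print(rate)  # 0.5: only the averages 0.50 and 0.62 lie far enough above 0.3
```

If neurons were random groups, the rejection rate would be expected to stay close to `alpha`; a rate well above it is the signal that the grouping is semantically consistent.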

Consequently, in order to assess whether this percentage was
indeed anticipated or not, we performed a one-sided binomial
test ($H_0 : p = \alpha$, $H_1 : p > \alpha$) at 0.05 level of significance,
for comparing this percentage with the level of significance of
the t-test. In the cases when $H_0$ was rejected, we deduced that
both $p$ and the overall level of similarity of terms within the
neurons were statistically significantly greater than expected,
based on our key initial assumption. This meant that terms
within a neuron demonstrated greater similarity in comparison
to a random group of terms, indicating that this grouping made
sense. Table II displays the percentages $p$ for each of the three
cases and for different levels of significance for each of the
three periods (see next subsection). The p-values derived from
the application of the binomial test are not displayed, since
they are all negligible. This holds because the percentage is
much greater than $\alpha$ in every case.
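
The one-sided binomial test can be written out exactly. The sketch below sums the binomial probabilities in log space to avoid overflow for large neuron counts; the count of 1000 neurons in the example is a hypothetical figure, since the number of neurons per case is not reported here.

```python
from math import exp, lgamma, log

def log_pmf(i, n, p):
    """Logarithm of the Binomial(n, p) probability of exactly i successes (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for H1: rate > p."""
    return min(1.0, sum(exp(log_pmf(i, n, p)) for i in range(k, n + 1)))

# Hypothetical example: 287 rejections among 1000 neurons, tested against alpha = 0.05.
p_value = binomial_upper_tail(287, 1000, 0.05)
```

With an observed rate several times larger than the nominal level, the tail probability is vanishingly small, which is consistent with the remark that the p-values are negligible.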

As an example, consider the following neurons:
$ex_1$ = {thymine, cytosine, uracil} and $ex_2$ =
{pair, harvard, Scorpio, monsignor, misrepresentation}.
Intuitively, $ex_1$ displays apparent semantic similarity (all
terms are nucleobases), in contrast to the terms included in
$ex_2$, which do not. We would expect that $H_0 : \mu = \mu_0$ would
be rejected for $ex_1$ and not rejected for $ex_2$, which is indeed
the case. However, there are some cases when the semantic
similarity of terms within a neuron is not that apparent,
but the null hypothesis is again rejected. This is due to the
chosen path-based semantic similarity metric [78]; thus, we
leave as future work the application of further path-based and
information content-based methods. All in all, the robustness
of this statistical approach is based on our initial assumption
that the population of averages followed a normal distribution,
which we tested empirically. However, we believe this is the
groundwork for interesting future investigations.

D. Dynamics of semantic consistency 

We were interested to find out if over several periods
the overall similarity within each neuron converges. Our
experiment in this paper involved three periods and for each
of them we repeated the process described in the previous
subsection. We calculated the percentages of the cases where
we rejected the null hypothesis of the t-test for all three
periods, as displayed in Table II. As observed in the table,
there is a slight decrease of the percentages from period
to period. To assess whether this decrease is important, we
performed a test for proportional comparison at $\alpha = 0.05$.
The resulting p-values are shown in Table III and indicate
that this decrease is not statistically significant in any of the
cases. This means that there are indeed some slight differences,
but they do not demonstrate divergence. More periods would be needed
in order to investigate whether there is convergence, which we
will study in the near future.
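
One standard way to compare two proportions is the pooled two-proportion z-test, sketched below. Which exact test and which sample sizes underlie Table III is not stated here, so the one-sided alternative and the neuron counts in the example are assumptions, and the sketch is not expected to reproduce the reported p-values.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(p1, n1, p2, n2):
    """One-sided p-value for H1: proportion 1 > proportion 2 (pooled z-test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 1.0 - NormalDist().cdf(z)

# Rejection rates of periods 1 and 2 for N >= 3 at alpha = 0.05 (Table II),
# combined with hypothetical neuron counts.
p_small = two_proportion_p_value(0.287, 400, 0.269, 400)
p_large = two_proportion_p_value(0.287, 4000, 0.269, 4000)
```

The same difference in rates yields a smaller p-value as the number of neurons grows, which is why the neuron counts matter when judging whether the decrease between periods is significant.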

V. Conclusion and future work 

Recently, increased attention has been paid to models of
evolving semantic content, something we represented as a dynamic
vector field. This approach was based on the hypothesis
of semantic continuity in the vocabulary, allowing both for
manifest (actual) and latent (potential) word content mapped
to lexical forms, with word meaning in term space behaving
like “energy” while constructing conceptual categories. In spite
of the simple design to analyse book reviews over 18 years in
just three periods, the t-test confirmed that the overall level
of similarity of terms within the ESOM grid neurons was
significantly higher than expected, i.e. neuron content was
statistically consistent.




We plan to continue this line of research in different 
directions, including the following: 

• Increase the number of periods to compute more detailed 
visual maps over shorter time spans; 

• Increase grid granularity to reduce term overlap on neurons;

• Upgrade this model by introducing reflexive random
indexing to smooth the transition between periods;

• Interpret the vector field model in terms of e.g. process 
philosophy. 